ABSTRACT Butyrate-producing bacteria have recently gained attention, since they are important for a healthy colon and when 
altered contribute to emerging diseases, such as ulcerative colitis and type II diabetes. This guild is polyphyletic and cannot be 
accurately detected by 16S rRNA gene sequencing. Consequently, approaches targeting the terminal genes of the main butyrate- 
producing pathway have been developed. However, since additional pathways exist and alternative, newly recognized enzymes 
catalyzing the terminal reaction have been described, previous investigations are often incomplete. We undertook a broad analy- 
sis of butyrate-producing pathways and individual genes by screening 3,184 sequenced bacterial genomes from the Integrated 
Microbial Genome database. Genomes of 225 bacteria with a potential to produce butyrate were identified, including many pre- 
viously unknown candidates. The majority of candidates belong to distinct families within the Firmicutes, but members of nine 
other phyla, especially from Actinobacteria, Bacteroidetes, Fusobacteria, Proteobacteria, Spirochaetes, and Thermotogae, were 
also identified as potential butyrate producers. The established gene catalogue (3,055 entries) was used to screen for butyrate 
synthesis pathways in 15 metagenomes derived from stool samples of healthy individuals provided by the HMP (Human Microbiome
Project) consortium. A high percentage of total genomes exhibited a butyrate-producing pathway (mean, 19.1%; range, 
3.2% to 39.4%), where the acetyl-coenzyme A (CoA) pathway was the most prevalent (mean, 79.7% of all pathways), followed by 
the lysine pathway (mean, 11.2%). Diversity analysis for the acetyl-CoA pathway showed that the same few firmicute groups associated
with several Lachnospiraceae and Ruminococcaceae were dominating in most individuals, whereas the other pathways 
were associated primarily with Bacteroidetes. 

IMPORTANCE Microbiome research has revealed new, important roles of our gut microbiota for maintaining health, but an un- 
derstanding of effects of specific microbial functions on the host is in its infancy, partly because in-depth functional microbial 
analyses are rare and publicly available databases are often incomplete/misannotated. In this study, we focused on production of 
butyrate, the main energy source for colonocytes, which plays a critical role in health and disease. We have provided a complete 
database of genes from major known butyrate-producing pathways, using in-depth genomic analysis of publicly available ge- 
nomes, filling an important gap to accurately assess the butyrate-producing potential of complex microbial communities from 
"-omics"-derived data. Furthermore, a reference data set containing the abundance and diversity of butyrate synthesis pathways 
from the healthy gut microbiota was established through a metagenomics-based assessment. This study will help in understand- 
ing the role of butyrate producers in health and disease and may assist the development of treatments for functional dysbiosis. 

Butyrate-producing bacteria are widespread and can be found 
in many environments (1) but especially in host-associated 
sites, including the rumen (2), the mouth (3), and the large intestine
(4). Recently, butyrate has gained attention because of its proposed
key role in maintaining gut homeostasis and epithelial integrity,
since it serves as the main energy source for colonocytes, 
directly influences host gene expression by inhibiting histone 
deacetylases, and interferes with proinflammatory signals, such as 
NF-κB (5, 6). A breakdown of epithelial integrity is associated 
with emerging diseases such as inflammatory bowel diseases and 
type II diabetes (7, 8), and butyrate-producing members specifi- 
cally are reduced in such patients (9, 10). 

Butyrate producers form a functional cohort rather than a
monophyletic group, and members of Lachnospiraceae and
Ruminococcaceae have received the most attention because they are very 
abundant in the human colon, comprising 10 to 20% of the total 
bacteria. Butyrate is synthesized via pyruvate and acetyl- 
coenzyme A (CoA), mostly by the breakdown of complex poly- 
saccharides (e.g., starch and xylan) that escape digestion in the 
upper gastrointestinal tract and reach the colon (11). Alternative 
substrates, particularly those derived from cross-feeding with 
other primary degraders and lactate-synthesizing bacteria, are de- 
scribed as well (12). Acetyl-CoA is then converted to the interme- 
diate butyryl-CoA in a four-step pathway closely related to the 
β-oxidation of fatty acids in prokaryotes and eukaryotes (13, 14). 
It is postulated that butyrate producers can conserve energy during
the conversion from crotonyl-CoA to butyryl-CoA, which creates
a proton motive force via ferredoxin reduction by the butyryl-CoA
dehydrogenase electron-transferring flavoprotein complex 
(15). The final step from butyryl-CoA to butyrate is either cata- 
lyzed by butyryl-CoA:acetate CoA transferase (encoded by but) 
or butyrate kinase (encoded by buk; after phosphorylation of 
butyryl-CoA). Typically, these two genes are used as biomarkers 
for the identification/detection of butyrate-producing communities
(16, 17). However, direct functional predictions based on gene 
homology alone can commonly result in misannotations if genes 
with distinct function share regions of high similarity, as specifi- 
cally described for both but and buk (17). Furthermore, CoA 
transferases show activity with several different substrate combi- 
nations in vitro (18), and alternative terminal CoA transferases 
were proposed for this pathway (19). Targeting the whole pathway 
for functional predictions is hence a robust way to circumvent 
difficulties associated with the analysis based on specific genes 
only. Additionally, there are other known butyrate-producing 
pathways, namely, the lysine, glutarate, and 4-aminobutyrate 
pathways, where amino acids serve as major substrates. These 
pathways are found in Firmicutes as well as other phyla, such as 
Fusobacteria and Bacteroidetes (20-22), but are traditionally ne- 
glected as potential butyrate-producing routes in enteric environ- 
ments. 

The availability of complete databases, including diverse can- 
didates and pathways, is essential to investigate specific microbial 
functions in complex microbial communities, to assess their ef- 
fects on the host, and to ultimately develop treatment strategies 
for functional dysbiosis. The aim of this study was to screen avail- 
able genomes, many from the Human Microbiome Project 
(HMP) framework, for potential butyrate producers and to char- 
acterize their phylogeny, gene arrangements, and gene phylogeny. 
The resulting gene catalogue was then used to screen for butyrate 
synthesis pathways in metagenomic HMP data to reveal this im- 
portant functional community within the healthy microbiota. 

FIG 1 Four different pathways for butyrate synthesis and corresponding
genes (protein names) are displayed. Major substrates are shown. Terminal
genes are highlighted in red. L2Hgdh, 2-hydroxyglutarate dehydrogenase; Gct,
glutaconate CoA transferase (α, β subunits); HgCoAd, 2-hydroxy-glutaryl-CoA
dehydrogenase (α, β, γ subunits); Gcd, glutaconyl-CoA decarboxylase
(α, β subunits); Thl, thiolase; Hbd, β-hydroxybutyryl-CoA dehydrogenase;
Cro, crotonase; Bcd, butyryl-CoA dehydrogenase (including electron transfer
protein α, β subunits); KamA, lysine-2,3-aminomutase; KamD,E,
β-lysine-5,6-aminomutase (α, β subunits); Kdd, 3,5-diaminohexanoate
dehydrogenase; Kce, 3-keto-5-aminohexanoate cleavage enzyme; Kal,
3-aminobutyryl-CoA ammonia lyase; AbfH, 4-hydroxybutyrate dehydrogenase; AbfD,
4-hydroxybutyryl-CoA dehydratase; Isom, vinylacetyl-CoA 3,2-isomerase
(same protein as AbfD); 4Hbt, butyryl-CoA:4-hydroxybutyrate CoA transferase;
But, butyryl-CoA:acetate CoA transferase; Ato, butyryl-CoA:acetoacetate
CoA transferase (α, β subunits); Ptb, phosphate butyryltransferase; Buk,
butyrate kinase. Cosubstrates for individual butyryl-CoA transferases are
shown. 



RESULTS 

Overview of butyrate synthesis pathways. There are four 
main pathways known for butyrate production, the acetyl-CoA, 
glutarate, 4-aminobutyrate, and lysine pathways (Fig. 1). All path- 
ways merge at a central energy-generating step where crotonyl- 
CoA is transformed to butyryl-CoA, catalyzed by the butyryl-CoA 
dehydrogenase electron-transferring flavoprotein complex (Bcd-Etfαβ).
The final conversion to butyrate is performed by various 
butyryl-CoA transferases that use cosubstrates either formed 
earlier in the individual pathways, namely, acetoacetate and 
4-hydroxybutyrate for the lysine and 4-aminobutyrate pathways, 
respectively, or from external sources, as shown for butyryl-CoA: 
acetate CoA transferase (But) (23). Other transferases not shown 
in Fig. 1 have been proposed as final enzymes as well (19), and our 
data support those suggestions (see below) (Fig. 2). Alternatively, 
butyryl-CoA is phosphorylated and transformed to butyrate via 
butyrate kinase (Buk), leading to the formation of ATP. A small 
number of strains contain both But and Buk (see below) (Fig. 2). 
Since no possible cosubstrate for butyryl-CoA transferase is 
formed in the glutarate pathway, we considered But and Buk as the 
final enzymes for that pathway. 
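
As a compact restatement of the pathway logic in this section, the following Python sketch encodes each pathway as a set of required enzymes plus the terminal enzymes considered for it. The grouping of enzymes by pathway is one reading of the Fig. 1 legend, and the name `candidate_pathways`, the pairing of Ptb with Buk, and the simple all-enzymes-present rule are illustrative assumptions. It is not the screening procedure used in the study, which also relied on HMM searches and synteny evaluation.

```python
# Illustrative sketch only: enzyme groupings follow one reading of the Fig. 1 legend.
SHARED = {"Bcd"}  # butyryl-CoA dehydrogenase complex, common to all four pathways

PATHWAYS = {
    "acetyl-CoA": {
        "core": {"Thl", "Hbd", "Cro"},
        "terminal": [{"But"}, {"Ptb", "Buk"}],
    },
    "glutarate": {
        "core": {"L2Hgdh", "Gct", "HgCoAd", "Gcd"},
        "terminal": [{"But"}, {"Ptb", "Buk"}],
    },
    "4-aminobutyrate": {
        "core": {"AbfH", "AbfD"},
        "terminal": [{"4Hbt"}],
    },
    "lysine": {
        "core": {"KamA", "KamDE", "Kdd", "Kce", "Kal"},
        "terminal": [{"Ato"}],
    },
}


def candidate_pathways(genes):
    """Return pathways whose core, shared, and one terminal enzyme set are all present."""
    found = set(genes)
    hits = []
    for name, spec in PATHWAYS.items():
        core_ok = (spec["core"] | SHARED) <= found
        terminal_ok = any(option <= found for option in spec["terminal"])
        if core_ok and terminal_ok:
            hits.append(name)
    return hits
```

Under these assumptions, a genome carrying Thl, Hbd, Cro, Bcd, and But is reported as an acetyl-CoA pathway candidate, whereas the same genome without any terminal enzyme is not. Relaxing the terminal-enzyme requirement would be one way to model the cases discussed later in which a transferase from another pathway may take over the final step.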

Potential butyrate producers detected. Potential microbial 
functions are commonly inferred from isolates/sequenced genomes
and whole communities by targeting specific key genes that
characterize the function. However, as mentioned above, such an 
approach can be problematic in the case of butyrate synthesis, and 
targeting complete pathways together with several downstream 
analyses is a more robust way to predict potential function and 
additionally can provide insights into potential substrate require- 
ments for functional performance. A detailed outline of the 
screening procedure is presented in Fig. S1 and Text S1 in the 
supplemental material. Briefly, hidden Markov models (HMM) 
together with EC number searches on the Integrated Microbial 
Genome (IMG) platform were used to detect potential genes 
among genomes, and results were subsequently evaluated based 
on their synteny among all pathway genes. A gene catalogue con- 
taining 3,055 entries from 225 organisms was established (see 
Data Set S1). We found the acetyl-CoA pathway to be present in 
the majority of potential butyrate producers. The lysine pathway 
was represented in many phyla as well, whereas the 
4-aminobutyrate- and glutarate-based pathways were the least 
abundant and were found in only four phyla (namely, Firmicutes, 
Fusobacteria, Spirochaetes, and Bacteroidetes). Several isolates 
exhibit genes for two or three pathways, indicating butyrate syn- 
thesis as having a central role in energy conservation. Figure 2 
displays all potential butyrate producers obtained, including 124 
strains with confirmed functional activity (based on species level). 
Candidate butyrate producers were isolated from distinct environments
and represent a broad taxonomic range associated with 10 
different phyla. An additional literature search for nonsequenced 
butyrate producers revealed that almost all families exhibiting 
butyrate-producing members, except some strains from Clostridi- 
ales incertae sedis XIII and the Synergistaceae (see Fig. S2), are 
included in our genome-based study. Hence, we consider our da- 
tabase to be a good representation of the known diversity of 
butyrate-producing bacteria. Pathway analysis of all strains from 
individual families confirmed earlier observations that the ability
for butyrate production is not consistent within families. Not all 
members of the same family exhibit butyrate-producing pathways 
(see Fig. S3), demonstrating that phylogenetic analysis (on the 
family level) does not enable functional predictions. 
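
The synteny evaluation used in the screening procedure can be illustrated with a small sketch. It treats each detected gene as a position (gene index) on a contig and asks whether all genes of a pathway fall within a fixed neighborhood. The function name `in_synteny`, the input layout, and the `max_span` threshold of 10 gene positions are hypothetical choices made for illustration; the criteria actually applied are those described in Text S1 and are not reproduced here.

```python
def in_synteny(positions, required, max_span=10):
    """Check whether all required genes lie on one contig within max_span gene positions.

    positions: dict mapping gene name -> (contig_id, gene_index); hypothetical layout.
    required: iterable of gene names that make up the pathway.
    max_span: largest allowed distance between first and last gene (assumed value).
    """
    required = list(required)
    if any(gene not in positions for gene in required):
        return False  # a gene that was not detected cannot be in synteny
    contigs = {positions[gene][0] for gene in required}
    if len(contigs) != 1:
        return False  # genes are spread over several contigs (or none were given)
    indices = [positions[gene][1] for gene in required]
    return max(indices) - min(indices) <= max_span
```

A check of this kind only captures physical clustering. It says nothing about gene orientation or operon structure, which a fuller analysis of gene arrangements would also consider.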

As expected, strains belonging to Firmicutes were identified as
the major butyrate-producing group, exhibiting both demonstrated
producers and potential candidates that span 18 different
families. These strains were isolated from many environments and
different host-associated sites. In this phylum, the acetyl-CoA
pathway is dominant, genes are in good synteny, and the
Bcd-Etfαβ complex is well conserved (Fig. 2; see also Fig. S4 in the
supplemental material). Whereas but and buk were identified as
terminal genes in most candidates, some strains, especially the
Erysipelotrichaceae, contain atypical transferases (Fig. 2). Only a
few firmicute isolates exhibit other pathways. Notably, bacteria
linked primarily to nonfermentative growth styles, namely,
syntrophic growth of Syntrophomonadaceae (24) and anaerobic
respiration, especially for the Peptococcaceae (25), were also detected;
in these taxa, the acetyl-CoA pathway is used in the reverse direction to
oxidize butyrate. Their gene sequences and arrangements are
closely related to those of known butyrate producers (see below;
see also Fig. S4), and all exhibit true terminal enzymes.

FIG 2 A list of all obtained candidate bacteria and their taxonomic classifications. Firmicutes are shown in panel A, whereas candidates associated with other
phyla are displayed in panel B. Names in bold represent known butyrate-producing strains. Origins of isolates (Isol.), where brown refers to human/animal-associated
strains (individual body sites of isolation are as follows: GI, gastrointestinal tract; UG, urogenital tract; O, oral tract) and green to environmental
isolates, are given. Individual pathways with corresponding final genes are shown, namely, the acetyl-CoA pathway (AceCoA; orange-yellow) and the glutarate
pathway (Gltr; blue) with but (encoding butyryl-CoA:acetate CoA transferase; red; light pink represents "atypical" transferases) and buk (butyrate kinase; red),
as well as the 4-aminobutyrate pathway (4-Amin; pink) with the 4Hbt gene (butyryl-CoA:4-hydroxybutyrate CoA transferase; red) and the lysine pathway (Lys;
grey) with ato (encoding butyryl-CoA:acetoacetate CoA transferase). Results of synteny analysis for genes of individual pathways are indicated (see key to color
patterns at the bottom). Black cells in the column "Bcd-αβ" represent the presence of the butyryl-CoA dehydrogenase electron transfer protein complex, i.e., bcd
is in synteny with the etf genes. Names in red indicate isolates that are reported to oxidize butyrate for growth. Actinob., Actinobacteria; Spro., Spirochaetes; The.,
Thermotogae; Bact, Bacteroidetes; C. Incertae Sedis, Clostridiales incertae sedis. For more explanation, see the text. 

The Fusobacteria display an interestingly diverse pattern: two
strains, namely, Fusobacterium mortiferum and Ilyobacter
polytropus, exhibit only the acetyl-CoA pathway (with but as the
terminal gene), whereas the amino acid-fed pathways, glutarate
and lysine, which are the only known routes for butyrate production
in Fusobacteria (20, 26), are most prominent in other strains.
We detected genes from the acetyl-CoA pathway in those strains
as well, but without synteny and without the terminal genes
(Fig. 2). However, butyryl-CoA:4-hydroxybutyrate CoA transferase
(encoded by the 4Hbt gene) was found in all strains, while
additional genes from the 4-aminobutyrate pathway were often
completely lacking. If the acetyl-CoA pathway is indeed functional
in those isolates, 4Hbt might take the role of the terminal
transferase. 

Bacteroidetes, mainly represented by Porphyromonadaceae, ex- 
hibit three pathways with genes in good synteny. It was surprising 
to find the acetyl-CoA pathway in Porphyromonas species, since 
this taxon is considered asaccharolytic (27). Notably, this is in 
accordance with the observed gene arrangements, where this 
pathway is colocated with the lysine pathway in the same operon 
(see Fig. S4 in the supplemental material), and acetoacetyl-CoA, 
formed during lysine fermentation, can be directly used as the 
substrate at the second step (Fig. 2). Accordingly, thiolase (Thl), 
the enzyme catalyzing the first reaction in the acetyl-CoA path- 
way, could not be detected in Porphyromonadaceae. This "cross- 
feeding" is probably occurring in all strains exhibiting these two 
pathways, since it allows for increased energy production via the 
Bcd-Etfαβ complex and ferredoxin reduction. It should be noted 
that a final enzyme for that pathway is missing in this taxon, but 
terminal transferases linked to other pathways were detected. 

Our analysis suggests that several members of Actinobacteria 
and Thermotogae contain the lysine pathway for butyrate produc- 
tion. However, we are not aware of any described butyrate- 
producing member of those phyla, and culture-based experiments 
containing lysine as a nutrient source need to be performed to 
confirm them as real butyrate producers. 

Our gene and pathway analysis also revealed isolates of the 
phyla Chrysiogenetes, Deferribacteres, Proteobacteria, Spirochaetes, 
and Tenericutes as potential butyrate producers. However, only 
Spirochaetes contain confirmed butyrate-producing members, 
and genes linked to the acetyl-CoA pathway with but as the termi- 
nal gene were found in this taxon. Members of the family Trepo- 
nemaceae additionally exhibit the glutarate pathway. 

Detailed gene analysis. Detailed sequence analysis of aligned 
gene products from all candidates revealed several conserved sites 
for each gene (see Data Set S2 in the supplemental material). Simplified
presentations of neighbor-joining trees of the individual
genes (protein sequences) are displayed in Fig. 3. Based on the 
trees, horizontal gene transfer (HGT) signatures were detected, 
especially for the acetyl-CoA pathway, where genes from individ- 
ual Firmicutes families do not form homogenous groups but in- 
terrupt each other. Additionally, phylum-level HGT for members 
of Fusobacteria and Spirochaetes was observed (Fig. 3). Trees can 
be split into four major sections for this pathway; the first contains 
Eubacteriaceae, Lachnospiraceae, and Ruminococcaceae, disrupted 
by Fusobacteria and Spirochaetes, followed by Erysipelotrichaceae 
and members of Clostridiaceae. The second part consists of Clos- 
tridiales incertae sedis XI, Peptostreptococcaceae, and all members 
of Bacteroidetes. Strains belonging to Thermoanaerobacteriaceae 
and Clostridiaceae, mainly Clostridium botulinum, form the third 
cluster, whereas the bottom section consists of Proteobacteria and 
paralogous genes of Syntrophomonadaceae and Peptococcaceae. 
However, some exceptions to this overall trend exist where only 
one single thl gene cluster for all Clostridiaceae strains was detected 
and an additional tight group of crotonase genes linked to certain 
Lachnospiraceae is located outside the taxon's first section, indi- 
cating that they have evolved from different precursors than in 
other strains of Lachnospiraceae (Fig. 2). Interestingly, those genes 
are not in synteny with other genes from that pathway (see 
Fig. S4). Genes belonging to additional families of Firmicutes, such 
as the Veillonellaceae, did not display consistent patterns for this 
pathway. Peptococcaceae and Syntrophomonadaceae are clustering 
together close to known butyrate producers (except for
β-hydroxybutyryl-CoA dehydrogenase [encoded by the hbd gene]). 
With only a few exceptions, all individual phyla form tight clusters 
within the other three pathways analyzed, indicating little HGT (at 
phylum level). The genes shared by all pathways, i.e., bcd and the
Etfαβ genes, did not display consistent patterns associated with a 
specific pathway (data not shown). 

Terminal genes displayed patterns similar to those of other 
genes of the corresponding pathways, indicating that all genes of 
an individual pathway coevolved in many of the strains analyzed. 
Thus, overall, our results suggest that specific types of transferases 
are indeed associated with a certain pathway. However, the acetyl-CoA
pathway, especially, shows exceptions, where alternative 
transferases were found in several isolates (Fig. 2) (19), and gene 
arrangement analysis indicated that transferases linked to other 
pathways might catalyze the final step to butyrate in certain iso- 
lates (e.g., see Fusobacteria or Porphyromonas). Several paralogous 
genes were detected for both buk, associated with C. botulinum 
and Clostridium difficile, and but, derived mainly from Lachno- 
spiraceae, Syntrophomonadaceae, and Veillonellaceae, at the bot- 
toms of individual trees (Fig. 3). 

Metagenomic analysis. Figure 4 displays the overall butyro- 
genic potential of 15 stool samples of healthy individuals provided 
by the HMP. High percentages of genomes were calculated to 
exhibit a pathway (median, 19.1%; range, 3.2% to 39.4%), where 
the acetyl-CoA pathway was dominating for almost all individuals 
(mean, 79.7%; range, 46% to 97.5% of all pathways), followed by 
the lysine pathway, which showed large variations between samples
(mean, 11.2%; range, 0.5% to 49.7% of all pathways) and 
was especially highly abundant (>35%) for four individuals. The 
glutarate and 4-aminobutyrate pathways were consistently de- 
tected at low abundances (mean, 2.5%; range, 0.8% to 9.6%; and 
mean, 2.4%; range, 0.3% to 10.5% of all pathways, respectively). 
The overall butyrogenic potential was estimated as the sum of all 
detected pathways. Notably, amino acid-fed pathways did not often
occur in genomes alone but usually occurred together with the
acetyl-CoA pathway (Fig. 2), and the summarized cumulative results
presented in Fig. 4 are hence likely an overestimate.

FIG 3 Simplified representations of neighbor-joining trees of individual genes (protein sequences) are shown. The left column in each tree shows arrangement
of different genes associated with different families within the phylum Firmicutes, whereas gene entries linked to other phyla are given in the right column. For
a key to colors see the bottom. Letters (A to D) represent the four distinct regions of individual trees based on genes of the acetyl-CoA pathway, and "*" marks
deviations from the overall trend. For an explanation, see the text.

FIG 4 Abundance of butyrate-producing pathways (calculated as a percentage
of total bacterial genomes theoretically exhibiting a pathway) in metagenomic
data from stool samples of 15 healthy humans is shown. Different
colors represent individual pathways (acetyl-CoA pathway, orange; glutarate
pathway, blue; 4-aminobutyrate pathway, pink; lysine pathway, grey). The box
plot displays the data distribution for all 15 samples analyzed (A to O). 
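
To make the bookkeeping behind the metagenomic abundance figures concrete, the sketch below sums per-pathway percentages into an overall butyrogenic potential for one sample and expresses each pathway as a share of that sum. The input values in the usage example are invented for illustration and are not data from this study, and the function name `pathway_shares` is likewise hypothetical. Because pathways can co-occur in one genome, a sum of this kind can overestimate the fraction of genomes carrying at least one pathway.

```python
def pathway_shares(percent_genomes):
    """Return (total, shares) for one sample.

    percent_genomes: dict mapping pathway name -> percentage of total genomes
        estimated to carry that pathway.
    total: sum over pathways (overall butyrogenic potential, in % of genomes).
    shares: each pathway expressed as a percentage of that total.
    """
    total = float(sum(percent_genomes.values()))
    if total == 0:
        return 0.0, {name: 0.0 for name in percent_genomes}
    shares = {name: 100.0 * value / total for name, value in percent_genomes.items()}
    return total, shares


# Invented example values, not measurements from the study.
sample = {"acetyl-CoA": 16.0, "lysine": 2.0, "glutarate": 1.0, "4-aminobutyrate": 1.0}
total, shares = pathway_shares(sample)
```

Applying the function to every sample and then summarizing the resulting totals and shares (mean, median, range) would yield summary values of the kind reported in this section.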

Detailed analysis revealed a broad diversity of butyrate- 
producing-pathway genes for individual samples, where almost all 
detected groups are associated with known butyrate producers 
(Fig. 5). Interestingly, a few key groups dominated for most indi- 
viduals, suggesting a butyrate-producing taxonomic core in 
healthy colons. This core consisted of groups associated with 
known butyrate producers linked to specific Lachnospiraceae and 
Ruminococcaceae for the acetyl-CoA and glutarate pathways. 
Groups linked to Odoribacter splanchnicus and Alistipes putredinis 
(both members of the Bacteroidetes) dominate the lysine pathway, 
whereas groups similar to O. splanchnicus and Clostridium sym- 
biosum prevailed in the 4-aminobutyrate pathway. These results 
indicate that butyrate production is not associated solely with 
members of the phylum Firmicutes and suggest that the Bacte- 
roidetes are often contributing to the overall butyrogenic potential 
as well. However, current knowledge of the Bacteroidetes suggests 
that most carbon consumed does not result in butyrate produc- 
tion; hence, metabolic flux studies, under various nutritional con- 
ditions, are needed to quantify the contribution of this taxon to 
the butyrate pool. Obtained read abundances were relatively con- 
sistent for all genes of a pathway in an individual group (see Fig. S5 
in the supplemental material). Furthermore, the degree of explanation
was high, i.e., a large proportion of the reads that matched any gene in
our database was subsequently also included in the diversity
analysis, for which all genes of a pathway of an individual group had
to be detected in order for the group to be considered (see Materials and
Methods). However, especially for the lysine pathway, the detected 
genes of the entire pathway were occasionally split between differ- 
ent groups, i.e., no group was positive for all genes of that pathway, 
which prevented diversity analysis for some samples (not for those 
exhibiting an overall high abundance of this pathway) (Fig. 5). 
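
The completeness rule used for the diversity analysis (a group is considered only if every gene of a pathway is detected for it) can be sketched as follows. The data layout, the function name `complete_groups`, and the `min_reads` parameter are assumptions made for illustration; the procedure actually used is the one given in Materials and Methods.

```python
def complete_groups(read_counts, pathway_genes, min_reads=1):
    """Return the groups for which every pathway gene has at least min_reads reads.

    read_counts: dict mapping group name -> dict of gene name -> read count
        (hypothetical layout).
    pathway_genes: iterable of gene names that make up the pathway.
    min_reads: minimum read count for a gene to count as detected (assumed value).
    """
    genes = list(pathway_genes)
    return sorted(
        group
        for group, counts in read_counts.items()
        if all(counts.get(gene, 0) >= min_reads for gene in genes)
    )
```

If the genes of a pathway are detected but split between groups, for example one group carrying only kamA and kce and another carrying only kdd, the function returns an empty list. This mirrors the situation described above for the lysine pathway, where no single group was positive for all genes.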

but and buk were the dominating terminal genes in most sam- 
ples for the acetyl-CoA pathway, with median abundances of 
77.2% and 21.8%, respectively (see Fig. S6 in the supplemental 
material). Alternative transferases were detected only at very low 
abundances, suggesting that those enzymes do not play an important
role for butyrate synthesis in healthy humans. Although but is
the most prevalent terminal gene in our metagenomic data (median,
61.8%; range, 24.7% to 85.1% [considering all pathways]), it 
represents only one terminal point of the butyrate-producing 
pathways, and studies targeting only but for total functional anal- 
ysis should be aware of this limitation. 

DISCUSSION 

The established gene catalogue together with our metagenomic 
analysis allowed us to reveal microbial butyrate-producing com- 
munities in the healthy microbiota and their associated metabolic 
pathways. This metabolic framework is a critical step in investi- 
gating the role of this function in host health and disease. Al- 
though targeting complete pathways is a more robust way to pre- 
dict function than single-gene analysis, their detection in genomes 
does not automatically imply functionality, which must be
confirmed by specific biochemical testing. For several isolates, such as
members of Peptococcaceae and Syntrophomonadaceae, the detected 
ability to produce butyrate is doubtful, since they are known 
rather to oxidize butyrate for growth (see reference 28). This is 
also true for the majority of the Proteobacteria shown in Fig. 2, 
which belong to the delta class, that use anaerobic respiration for 
energy conservation, and butyrate consumption is documented 
for several isolates (e.g., see reference 29). In these taxa, pathway 
genes are often not in synteny and only distantly related to genes of 
confirmed butyrate producers (Fig. 3), and terminal genes are 
missing in many strains. However, it cannot be excluded that certain
environmental conditions, such as the absence of H2-consuming
bacteria or lack of appropriate inorganic electron acceptors,
might trigger fermentative growth and the synthesis of 
butyrate in certain isolates. Furthermore, a few strains are known 
to generate butyrate as a building block for secondary metabolites, 
such as salinosporamide B, produced by the actinobacterium 
Salinispora tropica (30). 

Neighbor-joining trees revealed very consistent patterns for all 
genes of an individual pathway, indicating a high degree of coevo- 
lution. Nevertheless, clear HGT signatures were detected in iso- 
lates, especially for the acetyl-CoA pathway, confirming earlier 
findings (31). However, our results indicate transfer of entire 
pathways rather than of single genes. The fast microbial turnover 
and enormous selective pressures in the colonic environment pro- 
mote large-scale HGT (32). Since the acetyl-CoA pathway was 
detected to be the dominant pathway, displaying the greatest di- 
versity, observations of HGT signatures specifically for this path- 
way make sense. Furthermore, our metagenomic results also did 
not detect unknown "disconnected HGT" events, i.e., bacteria 
that acquired genes of the acetyl-CoA pathway from distinct pre- 
cursors (representing unknown gene combinations). This sup- 
ports the observed coevolutionary behavior of all genes in this 
pathway. However, for the lysine pathway, the presence of gene 
combinations that have not yet been captured in sequenced iso- 
lates was indicated. 

FIG 5 The detected diversity in metagenomic data associated with individual pathways. Colors correspond to different pathways (acetyl-CoA pathway, orange;
glutarate pathway, blue; 4-aminobutyrate pathway, pink; lysine pathway, grey). Bacterial names represent members of individual groups (based on 10% complete
linkage clustering; for details, see Materials and Methods). Groups consist of the following: (i) only one reference genome (indicated by single strain names), (ii)
merged strains of the same species (indicated by species name without strain information), and (iii) merged genomes from distinct species (individual names are
given). The group "Fusobacteria several strains" consists of the following strains: Fusobacterium nucleatum subsp. nucleatum, Fusobacterium nucleatum subsp.

Diet is a major external force shaping gut communities (33).
Good reviews of studies investigating the influence of diet on
butyrate-producing bacteria exist (11, 34) and suggest that
plant-derived polysaccharides such as starch and xylan, as well as
cross-feeding mechanisms with lactate-producing bacteria, are
the main factors governing their growth. Our metagenomic analysis
supports the acetyl-CoA pathway as the main pathway for
butyrate production in healthy individuals (Fig. 4), implying that
a sufficient polysaccharide supply is probably sustaining a
well-functioning butyrate-producing community, at least in these 
North American subjects. However, the detection of additional 
amino acid-fed pathways, especially the lysine pathway, indicates 
that proteins could also play an important role in butyrate synthe- 
sis and suggests some flexibility of the microbiota to adapt to 
various nutritional conditions maintaining butyrate synthesis. 
Whether the prevalence of amino acid-fed pathway is associated 
with a protein-rich diet still needs to be assessed. It should be 
noted that those pathways are not restricted to single substrates, as 
displayed in Fig. 1, i.e., glutarate and lysine, but additional amino 
acids, such as aspartate, can be converted to butyrate via those 
routes as well (26). Furthermore, the acetyl-CoA pathway also can 
be supplied with substrates derived from proteins either by cross- 
feeding with the lysine pathway (as discussed above) or by direct 
fermentation of amino acids to acetyl-CoA (35). However, 
whereas diet-derived proteins are probably important for butyrate 
synthesis in the ileum, where epithelial cells use butyrate as a main 
energy source as well (36), it still needs to be assessed whether 
enough proteins reach the human colon to serve as a major nutri- 
ent source for microorganisms. Another possible colonic protein 
source could originate with lysed bacterial cells. Enormous viral 
loads have been detected in this environment, suggesting fast cell/ 
nutrient turnover, which might explain the presence of corre- 
sponding pathways in both fecal isolates and metagenomic data 
(Fig. 1, 4, and 5). Detailed investigations of butyrate-producing 
communities in the colon of carnivorous animals will add addi- 
tional key information on the role of proteins in butyrate produc- 
tion in that environment. It should be noted that diet provides 
only a part of the energy/carbon sources for microbial growth in 
the colon, since host-derived mucus glycans serve as an important 
nutrient source as well. Several butyrate-producing organisms do 
specifically colonize mucus (37), and for some, growth on mucus- 
derived substrates was shown (38). 

Systems biology together with metabolic modeling is a prom- 
ising approach to handle complexities of nutrient fluxes within 
the gut microbiota and will eventually help in predicting func- 
tional performance (39). This study provides an important step 
forward, since it enabled us to assess the butyrate-producing po- 
tential of complex microbial communities, including predictions 
of basic nutritional requirements for butyrate synthesis. However, 
besides substrate availability, additional factors, such as pH, were 
shown to govern the successful competition of butyrate producers 
with other intestinal organisms (11). Furthermore, the presence of 
butyrate-producing pathways alone might not allow optimal 
predictions of actual butyrate 
production, since the organisms involved show metabolic flexibil- 
ity and diverse profiles of fermentation products. Butyrate synthe- 
sis was shown to be influenced by several factors, such as type of 
limiting substrate and growth rate (40), oxygen concentration 
(41), and growth style (attached versus unattached [42]). Further- 
more, both the presence of inorganic electron acceptors promot- 
ing anaerobic respiration and aceto-/methanogenesis lowering 
the H2 partial pressure can lead to more oxidized fermentation 
products, especially acetate, at the expense of more reduced sub- 
stances, such as butyrate (40). Our metagenomic approach, in 
combination with additional "-omics"-based technologies, will 
help to improve functional predictions and to assess the resulting 
effects on the host. 



MATERIALS AND METHODS 

Establishing the gene catalogue. Individual pathways shown in Fig. 1 are 
based on KEGG with modifications. Most importantly, the entire lysine 
pathway and certain steps in the 4-aminobutyrate pathway are not present 
in KEGG and were included based on references 22 and 43. KEGG addi- 
tionally displays the conversion from butanol to butyrate, which was not 
included in this study. Furthermore, a possible route from acetoacetate via 
poly-β-hydroxybutyrate and crotonoyl-CoA to butyrate is suggested in 
KEGG. However, this pathway contains an unlikely reverse reaction of 
extracellular poly-β-hydroxybutyrate degradation enzymes that differ 
considerably from intracellular depolymerases (44), and this route was 
hence not considered. The stereospecific separation between 
R-hydroxybutyrate and S-hydroxybutyrate in the acetyl-CoA pathway 
was omitted, and the two routes were merged. 

Screening of genomes was divided into two main parts, where the first 
was based on EC number searches (from KEGG) within the Integrated 
Microbial Genome (IMG) (http://img.jgi.doe.gov) database and the sec- 
ond part used HMM models (both approaches were applied on a protein 
level). A detailed schematic representation of the work flow and abun- 
dance of obtained candidates (and associated genes) at each step is given 
in Fig. S1 in the supplemental material. First, all genes matching individ- 
ual EC numbers were obtained, and the data were queried for all candi- 
dates exhibiting all genes of a specific pathway. Since several model bu- 
tyrate producers failed the query, we allowed for one missing gene in each 
pathway. Candidates were then subjected to synteny analysis (see Fig. S1 
and Text S1 in the supplemental material). Since it was proposed that 
several different gene products are able to catalyze the final step in the 
acetyl-CoA pathway and their location is often apart from other genes in 
this pathway, we excluded the terminal enzymes here and treated them in 
separate analyses. After these first steps, we harvested genes from model 
butyrate producers and candidate strains displaying all genes of the indi- 
vidual pathway in close synteny (not considering terminal genes) and 
used the obtained sequences to construct HMM models to screen ge- 
nomes again. After applying certain cutoffs based on HMM scores (for 
details, see Fig. S1 and Text S1), candidates were filtered for exhibiting 
entire pathways (allowing one missing gene), and terminal genes were 
treated in separate analyses (for details, see Fig. S1 and Text S1). Finally, 
candidates from both EC number and HMM searches were combined and 
subjected to additional filtering based on detailed gene analysis consider- 
ing synteny and phylogenetic trees (for details, see Fig. S1 and Text S1). 
Protein sequences were aligned in the software program Clustal Omega 
(http://www.ebi.ac.uk/Tools/msa/clustalo), and neighbor-joining trees 
were constructed using the program MEGA 
(http://www.megasoftware.net). Taxonomy is displayed as provided by IMG 
with some modifications for the phylum Firmicutes based on RDP's 
classifications. 
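
The completeness filter described above (all genes of a pathway present, allowing one missing gene, with terminal enzymes handled separately) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name, the input layout, and the example genomes are hypothetical, and the gene list is only an illustrative stand-in for the non-terminal genes of the acetyl-CoA pathway.

```python
from typing import Dict, List, Set


def pathway_candidates(
    genome_hits: Dict[str, Set[str]],
    pathway_genes: List[str],
    max_missing: int = 1,
) -> Dict[str, List[str]]:
    """Map each candidate genome to the pathway genes it lacks.

    A genome is kept when it lacks at most `max_missing` genes,
    mirroring the "one missing gene allowed" rule in the text.
    """
    candidates: Dict[str, List[str]] = {}
    for genome, found in genome_hits.items():
        missing = [gene for gene in pathway_genes if gene not in found]
        if len(missing) <= max_missing:
            candidates[genome] = missing
    return candidates


# Illustrative gene list (assumption); terminal enzymes are left out
# because the text treats them in separate analyses.
ACETYL_COA_GENES = ["thl", "hbd", "cro", "bcd"]

# Invented example data: genes detected per genome.
genome_hits = {
    "genome_A": {"thl", "hbd", "cro", "bcd"},  # complete pathway
    "genome_B": {"thl", "hbd", "bcd"},         # one gene missing, still kept
    "genome_C": {"thl"},                       # three genes missing, dropped
}

result = pathway_candidates(genome_hits, ACETYL_COA_GENES)
print(result)
```

Returning the list of missing genes, rather than a plain yes/no, keeps track of which candidates passed only because of the relaxed rule, so they can be checked in later steps such as synteny analysis.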

Analysis of metagenomic data. Stool samples from 15 different indi- 
viduals were randomly selected from the HMP Data Analysis and Coor- 
dination Center (http://www.hmpdacc.org; parameters defining health 
can be obtained from the website). Raw nucleotide read sequences were 
aligned (blastn) against our database, requiring a minimum alignment 
length of 70 bp and sequence identity of >80%. Only the best-scoring 
alignment (lowest E value) was used for further analysis. The abundance 
of individual butyrate-producing pathways (Fig. 4) was calculated as 
follows: (i) $(\#\text{reads}_{\text{tot}} \times \text{length}_{\text{pathway}}) / (4 \times 10^{6}\ \text{bp}) = \text{th}_{100\%}$, and 
(ii) $\#\text{reads}_{\text{pathway}} / \text{th}_{100\%} = \text{result}$ (genomes exhibiting pathway [%]), where 
$\#\text{reads}_{\text{tot}}$ is the total number of reads for a sample, $\text{length}_{\text{pathway}}$ stands for 
the total length (bp) of all unique pathway genes (calculated from the 
median length of all entries in the database for a specific gene), $4 \times 10^{6}$ bp 
corresponds to an average genome size, $\text{th}_{100\%}$ is the theoretical number 
of reads if all genomes exhibit the pathway, and $\#\text{reads}_{\text{pathway}}$ corresponds 
to the number of reads matching the pathway (BLAST result). Detailed 
results are presented in Fig. S7 in the supplemental material. 
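
The read filtering and abundance calculation described in this paragraph can be sketched in Python as follows. This is a hedged illustration, not the pipeline used in the study. The tuple layout for a hit, the function names, and all example numbers are assumptions. The factor of 100 reflects our reading that the result is reported as a percentage.

```python
from typing import Dict, Iterable, Tuple

# Assumed layout of one alignment:
# (read_id, gene_id, percent_identity, alignment_length_bp, e_value)
Hit = Tuple[str, str, float, int, float]


def best_hits(
    hits: Iterable[Hit],
    min_len: int = 70,
    min_identity: float = 80.0,
) -> Dict[str, Hit]:
    """Keep, per read, the lowest-E-value hit that passes both cutoffs."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        read_id, _gene, identity, aln_len, evalue = hit
        # Require alignment length >= 70 bp and identity > 80%.
        if aln_len < min_len or identity <= min_identity:
            continue
        if read_id not in best or evalue < best[read_id][4]:
            best[read_id] = hit
    return best


def percent_genomes_with_pathway(
    reads_pathway: int,
    reads_total: int,
    pathway_length_bp: float,
    genome_size_bp: float = 4e6,
) -> float:
    """Percentage of genomes exhibiting a pathway, following steps (i) and (ii)."""
    # (i) theoretical number of reads if every genome carried the pathway
    th_100 = reads_total * pathway_length_bp / genome_size_bp
    # (ii) observed reads relative to that expectation, scaled to percent
    return 100.0 * reads_pathway / th_100


# Invented example alignments.
hits = [
    ("r1", "thl_1", 95.0, 100, 1e-30),
    ("r1", "hbd_2", 90.0, 100, 1e-20),  # worse E value than the first r1 hit
    ("r2", "thl_1", 75.0, 100, 1e-10),  # fails the identity cutoff
    ("r3", "bcd_1", 99.0, 60, 1e-5),    # fails the length cutoff
]
best = best_hits(hits)

# Invented numbers: 1,000,000 reads, 4,000 bp of pathway genes,
# 191 reads assigned to the pathway.
pct = percent_genomes_with_pathway(191, 1_000_000, 4000)
print(best, pct)
```

With these invented inputs, the expected read count for full prevalence is 1,000,000 × 4,000 / 4,000,000 = 1,000 reads, so 191 matching reads correspond to about 19.1% of genomes.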

Prior to diversity analysis, individual genes from the database were 
subjected to multiple complete linkage clustering (using the Pyrosequencing 
Pipeline provided by the Ribosomal Database Project; 
http://rdp.cme.msu.edu) on the nucleotide level, applying a 10% cutoff. All genes of an 
individual pathway clustered very similarly (clusters for all individual 
pathway genes were usually associated with the same genomes), allowing 
us to group individual clusters of all genes of a specific pathway together. 
Thus, the obtained groups contained all genes of a specific pathway. If cluster 
results varied between genes (e.g., all thl genes from three candidates clus- 
ter together, whereas two clusters were generated for the hbd gene), then 
clusters were manually merged (e.g., merging all three hbd genes to mirror 
the associated thl genes) to achieve consistency, and the most conservative 
approach was always applied, i.e., clusters were only merged and never 
split. Genes of the same strain were always merged. For metagenomic 
analysis, a specific group (e.g., the group Faecalibacterium prausnitzii for 
the acetyl-CoA pathway consists of all pathway genes from all five strains 
of this taxon) was considered present only if all pathway genes could be 
identified for that group in the BLAST result (thus, BLAST hits did not 
have to match all genes from the same strain but only from the same 
group — an example [sample A] is shown in Fig. S5 in the supplemental 
material). Results presented in Fig. 5 are a median value for all individual 
pathway genes (see Fig. S5). The degree of explanation was calculated as 
the percentage of reads matching groups that were included in the diver- 
sity analysis (average from individual genes) from the total number of 
reads matching any gene in the database. 
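
The group-level presence rule and the per-gene median described above can be illustrated with a short Python sketch. It is one possible reading of the text and not the authors' implementation. The function name, the gene list, and all read counts are invented for illustration, and read counts are assumed to be pooled across all strains of a group.

```python
from statistics import median
from typing import Dict, List, Optional


def group_summary(
    reads_per_gene: Dict[str, int],
    pathway_genes: List[str],
) -> Optional[float]:
    """Return the median read count over pathway genes for one group.

    The group counts as present only if every pathway gene received
    at least one read; otherwise None is returned. Counts are pooled
    per group, so hits may come from different strains of that group.
    """
    counts = [reads_per_gene.get(gene, 0) for gene in pathway_genes]
    if any(count == 0 for count in counts):
        return None
    return median(counts)


# Illustrative gene list (assumption).
pathway_genes = ["thl", "hbd", "cro", "bcd"]

# Invented read counts per group and gene.
groups = {
    "Faecalibacterium prausnitzii": {"thl": 120, "hbd": 100, "cro": 90, "bcd": 110},
    "hypothetical group X": {"thl": 40, "hbd": 0, "cro": 35, "bcd": 50},
}

summary = {name: group_summary(counts, pathway_genes) for name, counts in groups.items()}
print(summary)
```

In this example the second group is reported as absent because no read matched its hbd gene, even though the other three genes were detected.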

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at http://mbio.asm.org 
/lookup/suppl/doi:10.1128/mBio.00889-14/-/DCSupplemental. 

Text S1, DOCX file, 0.2 MB. 

Figure S1, TIF file, 0.3 MB. 

Figure S2, EPS file, 0.4 MB. 

Figure S3, EPS file, 0.3 MB. 

Figure S4, PDF file, 3.2 MB. 

Figure S5, PDF file, 0.2 MB. 

Figure S6, EPS file, 0.2 MB. 

Figure S7, EPS file, 0.6 MB. 

Data Set S1, XLSX file, 0.1 MB. 

Data Set S2, PDF file, 0.2 MB. 